Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

Huang, Yujia, Ghatare, Adishree, Liu, Yuanzhe, Hu, Ziniu, Zhang, Qinsheng, Sastry, Chandramouli S, Gururani, Siddharth, Oore, Sageev, Yue, Yisong

arXiv.org Artificial Intelligence

We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form as functions of note characteristics, such as note density or chord progression, and many of them are non-differentiable, which poses a challenge when using them for guided diffusion. We propose Stochastic Control Guidance (SCG), a novel guidance method that requires only forward evaluation of rule functions and can work with pre-trained diffusion models in a plug-and-play way, thus achieving training-free guidance for non-differentiable rules for the first time. Additionally, we introduce a latent diffusion architecture for symbolic music generation with high time resolution, which can be composed with SCG in a plug-and-play fashion. Compared to standard strong baselines in symbolic music generation, this framework demonstrates marked advancements in music quality and rule-based controllability, outperforming current state-of-the-art generators in a variety of settings. For detailed demonstrations, code and model checkpoints, please visit our project website: https://scg-rule-guided-music.github.io/.
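
As a concrete illustration of guidance that needs only forward rule evaluations, the sketch below, written in the spirit of SCG rather than as its released implementation, scores several stochastic realizations of each reverse-diffusion step against a target rule value and keeps the best one. `model.denoise_step`, `model.predict_x0`, and the `note_density` rule are hypothetical placeholders.

```python
import torch

def note_density(piano_roll: torch.Tensor) -> torch.Tensor:
    """Toy non-differentiable rule: fraction of active cells per sample."""
    return (piano_roll > 0.5).float().flatten(1).mean(dim=1)

@torch.no_grad()
def guided_sample(model, x_T, timesteps, target, n_candidates=8):
    """Best-of-n reverse sampling guided by forward rule evaluation only."""
    x = x_T
    for t in timesteps:  # e.g. T-1, ..., 0
        # Draw several stochastic realizations of the reverse step.
        candidates = torch.stack(
            [model.denoise_step(x, t) for _ in range(n_candidates)]
        )
        # Predict the clean sample for each candidate and score it with
        # the rule; no gradients of the rule are ever required.
        x0_hats = torch.stack([model.predict_x0(c, t) for c in candidates])
        losses = (note_density(x0_hats) - target).abs()
        x = candidates[losses.argmin()]
    return x
```

Because the rule is only ever evaluated, never differentiated, the same loop works for thresholded, rule-based, or otherwise non-smooth objectives.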


Exploring Variational Auto-Encoder Architectures, Configurations, and Datasets for Generative Music Explainable AI

Bryan-Kinns, Nick, Zhang, Bingyuan, Zhao, Songyan, Banar, Berker

arXiv.org Artificial Intelligence

Generative AI models for music and the arts in general are increasingly complex and hard to understand. The field of eXplainable AI (XAI) seeks to make complex and opaque AI models such as neural networks more understandable to people. One approach to making generative AI models more understandable is to impose a small number of semantically meaningful attributes on them. This paper contributes a systematic examination of the impact that different combinations of Variational Auto-Encoder models (MeasureVAE and AdversarialVAE), configurations of the latent space (from 4 to 256 latent dimensions), and training datasets (Irish folk, Turkish folk, Classical, and pop) have on music generation performance when 2 or 4 meaningful musical attributes are imposed on the generative model. To date there have been no systematic comparisons of such models at this level of combinatorial detail. Our findings show that MeasureVAE has better reconstruction performance than AdversarialVAE, which in turn has better musical attribute independence. Results demonstrate that MeasureVAE was able to generate music across genres with interpretable musical dimensions of control, and performed best with low-complexity music such as pop and rock. We recommend a 32- or 64-dimensional latent space as optimal for 4 regularised dimensions when using MeasureVAE to generate music across genres. Our results are the first detailed comparisons of configurations of state-of-the-art generative AI models for music and can be used to help select and configure AI models, musical features, and datasets for more understandable music generation.
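
To make the attribute imposition concrete, here is a minimal sketch of one common latent-space attribute-regularisation term of the kind such regularised VAEs build on: it pushes a chosen latent dimension to order a batch the same way a musical attribute does. The function name and the tanh/sign formulation are illustrative assumptions, not the exact loss from these experiments.

```python
import torch
import torch.nn.functional as F

def attribute_reg_loss(z: torch.Tensor, attr: torch.Tensor,
                       dim: int, gamma: float = 1.0) -> torch.Tensor:
    # z:    (batch, latent_dim) latent codes from the VAE encoder
    # attr: (batch,) attribute values, e.g. note density per measure
    dz = z[:, dim].unsqueeze(0) - z[:, dim].unsqueeze(1)  # pairwise latent diffs
    da = attr.unsqueeze(0) - attr.unsqueeze(1)            # pairwise attr diffs
    # Penalise ordering disagreement between the latent dimension
    # and the attribute across the batch.
    return gamma * F.l1_loss(torch.tanh(dz), torch.sign(da))
```

In training, a term like this would be added to the usual reconstruction and KL objectives, once per regularised (dimension, attribute) pair.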


Exploring XAI for the Arts: Explaining Latent Space in Generative Music

Bryan-Kinns, Nick, Banar, Berker, Ford, Corey, Reed, Courtney N., Zhang, Yixiao, Colton, Simon, Armitage, Jack

arXiv.org Artificial Intelligence

Explainable AI has the potential to support more interactive and fluid co-creative AI systems which can creatively collaborate with people. To do this, creative AI models need to be amenable to debugging by offering eXplainable AI (XAI) features which are inspectable, understandable, and modifiable. However, there is currently very little XAI for the arts. In this work, we demonstrate how a latent variable model for music generation can be made more explainable; specifically, we extend MeasureVAE, which generates measures of music. We increase the explainability of the model by: i) using latent space regularisation to force specific dimensions of the latent space to map to meaningful musical attributes; ii) providing a user interface feedback loop that allows people to adjust dimensions of the latent space and observe the results of these changes in real time; iii) providing a visualisation of the musical attributes in the latent space to help people understand and predict the effect of changes to latent space dimensions. We suggest that in doing so we bridge the gap between the latent space and the generated musical outcomes in a meaningful way, making the model and its outputs more explainable and more debuggable. The code repository can be found at: https://github.com/bbanar2/
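
The interactive feedback loop in (ii) can be pictured as a one-function sketch: read the latent code, nudge one regularised dimension, decode. The `encoder`/`decoder` pair and the dimension-to-attribute mapping below are hypothetical stand-ins for the extended MeasureVAE, not its actual interface.

```python
import torch

# Assumed mapping from attribute names to regularised latent dimensions.
REGULARISED_DIMS = {"note_density": 0, "rhythmic_complexity": 1}

@torch.no_grad()
def adjust_attribute(encoder, decoder, measure, attribute: str, delta: float):
    z = encoder(measure)                        # (1, latent_dim) latent code
    z[0, REGULARISED_DIMS[attribute]] += delta  # nudge one "slider"
    return decoder(z)                           # regenerate the measure
```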


YM2413-MDB: A Multi-Instrumental FM Video Game Music Dataset with Emotion Annotations

Choi, Eunjin, Chung, Yoonjin, Lee, Seolhee, Jeon, JongIk, Kwon, Taegyun, Nam, Juhan

arXiv.org Artificial Intelligence

Existing multi-instrumental datasets tend to be biased toward pop and classical music. In addition, they generally lack high-level annotations such as emotion tags. In this paper, we propose YM2413-MDB, an 80s FM video game music dataset with multi-label emotion annotations. It includes 669 audio and MIDI files of music from 80s Sega and MSX PC games that use the YM2413, a programmable sound generator based on FM synthesis. The game music was converted from binary commands of the YM2413 sound chip and is arranged with a subset of 15 monophonic instruments and one drum instrument. Each song was labeled by two annotators using 19 emotion tags and validated by three verifiers to obtain refined tags. We provide baseline models and results for emotion recognition and emotion-conditioned symbolic music generation using YM2413-MDB.
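
A minimal multi-label baseline over the 19-tag vocabulary might look like the sketch below: independent sigmoid outputs per tag with a binary cross-entropy loss, since a song can carry several emotions at once. The backbone encoder is a deliberately trivial placeholder, not the authors' baseline model.

```python
import torch
import torch.nn as nn

NUM_TAGS = 19  # size of the YM2413-MDB emotion vocabulary

class EmotionTagger(nn.Module):
    def __init__(self, feature_dim: int = 128):
        super().__init__()
        # Stand-in for a real audio/MIDI feature encoder.
        self.backbone = nn.Sequential(nn.LazyLinear(feature_dim), nn.ReLU())
        self.head = nn.Linear(feature_dim, NUM_TAGS)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))  # one logit per tag

# Multi-label setup: each tag gets an independent sigmoid via binary
# cross-entropy rather than a softmax over mutually exclusive classes.
criterion = nn.BCEWithLogitsLoss()
```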


Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling

Tan, Hao Hao, Herremans, Dorien

arXiv.org Machine Learning

High-level musical qualities (such as emotion) are often abstract, subjective, and hard to quantify. Given these difficulties, it is not easy to learn good feature representations with supervised learning techniques, either because of the insufficiency of labels or the subjectivity (and hence large variance) of human-annotated labels. In this paper, we present a framework that can learn high-level feature representations with a limited amount of data by first modelling their corresponding quantifiable low-level attributes. We refer to our proposed framework as Music FaderNets, inspired by the fact that low-level attributes can be continuously manipulated by separate "sliding faders" through feature disentanglement and latent regularization techniques. High-level features are then inferred from the low-level representations through semi-supervised clustering using Gaussian Mixture Variational Autoencoders (GM-VAEs). Using arousal as an example of a high-level feature, we show that the "faders" of our model are disentangled and change linearly with respect to the modelled low-level attributes of the generated output music. Furthermore, we demonstrate that the model successfully learns the intrinsic relationship between arousal and its corresponding low-level attributes (rhythm and note density), with only 1% of the training set labelled. Finally, using the learnt high-level feature representations, we explore the application of our framework to style transfer tasks across different arousal states. The effectiveness of this approach is verified through a subjective listening test.
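
A hedged sketch of how such "faders" could support arousal style transfer: encode a song into disentangled low-level latents, then shift each latent by the vector between the source and target arousal components of the learned Gaussian mixture before decoding. Every method name on `model` below is an illustrative assumption, not the released Music FaderNets API.

```python
import torch

@torch.no_grad()
def arousal_style_transfer(model, song, target_arousal: int):
    """Shift each low-level 'fader' latent toward the target arousal cluster."""
    # Encode into disentangled low-level latents (the "faders").
    z_rhythm, z_density = model.encode(song)
    src = model.infer_arousal(z_rhythm, z_density)  # current high-level state
    # Latent arithmetic: move each fader by the offset between the
    # source and target arousal components of the Gaussian mixture.
    z_rhythm = z_rhythm + model.gm_mean("rhythm", target_arousal) \
                        - model.gm_mean("rhythm", src)
    z_density = z_density + model.gm_mean("note_density", target_arousal) \
                          - model.gm_mean("note_density", src)
    return model.decode(z_rhythm, z_density)
```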